Abstract 

A “cognote” system has been developed for coding electronic discussion groups and promoting 
critical thinking. Previous literature has provided an account of the strategy as applied to several 
academic settings. This paper addresses the research around establishing the inter-rater 
reliability of the cognote system. The findings suggest three indicators of reliability: 1) raters 
assign similar grades to students’ discussion group contributions, 2) raters predominantly 
assign the same cognotes to students’ discussion group contributions, and 3) raters select at 
least 50% of the same text when assigning the same cognotes. 

Inter-rater Reliability of an Electronic Discussion Coding System 

Constructivist learning environments are sought after by teachers of both face-to-face 
courses and internet-delivered instruction. The notion that learners best engage new ideas by 
negotiating meaning with peers in their learning group is widely promoted by proponents of 
social constructivism (Bonk & Cunningham, 1998; Duffy & Cunningham, 1996; Newman, 
Griffin & Cole, 1989; Scardamalia & Bereiter, 1994; Vygotsky, 1978, 1989). 

Electronic discussion is arguably the single most powerful tool for building learning 
communities in online learning. Its success in building such communities may hinge on the 
choices the instructor makes around assessment (MacKinnon, 2000). The range of strategies 
includes: 1) an open forum with no assessment and little instructor participation, 2) grading 
based solely on participation, 3) scoring rubrics that value the addition of new ideas or 
responses to existing postings, and 4) analytical rubrics that value higher-order discussion. 
Traditionally, instructors have found that the less-structured open forums tend to foster less 
productive discussion yet allow for a certain spontaneity in students’ contributions. 

Coding Electronic Discussion for Critical Thinking 

Most recently the cognote system has been developed (MacKinnon & Aylward, 2000) for 
application to electronic discussion. The cognotes are a series of icons that represent distinct 
argumentation styles (see Table 1). Using Microsoft Word macros, these codes can be assigned 
to students’ captured discussions, thereby providing feedback and acting as “critical thinking 
prompts”. The coded discussion is returned to students by email attachment. Because the 
experience has been prefaced with a set of class coding exercises, students readily recognise the 
cuing that the cognotes imply. Each code has a grade associated with it, depending on 
the level of cognitive engagement it represents. Students accumulate grades up to a maximum 
set by the instructor. Prior to the next discussion, students have the benefit of reflecting on their 
past discussion patterns and making improvements in their argumentation styles. This approach 
was based on research suggesting that metacognitive exercises can promote 
academic learning (Paris & Winograd, 1990; White & Mitchell, 1994). 
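
The grading rule described above can be sketched in a few lines of Python. This is only an illustration, not the Word macro used in the studies: the point values are those listed in Table 1, while the dictionary keys, the function name `total_grade`, and the default cap of ten are assumptions made for the example (the text says only that the maximum is set by the instructor).

```python
# Point values per cognote category, as listed in Table 1.
# The key names are hypothetical labels chosen for this sketch.
COGNOTE_POINTS = {
    "acknowledgement": 1,
    "question": 1,
    "compare": 2,
    "contrast": 2,
    "evaluation": 1,
    "idea_to_example": 2,
    "example_to_idea": 2,
    "clarification": 2,
    "cause_and_effect": 2,
    "off_topic": 0,
}


def total_grade(code_counts, max_grade=10):
    """Sum the points for every assigned cognote, capped at max_grade.

    code_counts maps a category name to the number of times a rater
    assigned that cognote. max_grade=10 is an assumed instructor setting.
    """
    raw = sum(COGNOTE_POINTS[code] * n for code, n in code_counts.items())
    return min(raw, max_grade)


# Hypothetical tally for one student's coded discussion.
example = {
    "acknowledgement": 1,
    "question": 1,
    "compare": 2,
    "evaluation": 1,
    "idea_to_example": 1,
}
print(total_grade(example))  # 1 + 1 + 4 + 1 + 2 = 9
```

Capping the sum rather than scaling it reflects the statement that students accumulate grades up to a maximum; a different instructor policy would change only the last line of the function.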

The Nature of the Codes 

In order to design a system for practical use, the instructor must first decide which 
discussion patterns are most valued. This research was designed around nine categories of 
discussion based on hard copy student journal coding (Knight, 1990). These categories include: 

■ Acknowledgement of opinions (“That’s a very good point.”); 

■ Questions (“What could you do to protect the child’s self-esteem?”); 

■ Compare (“I’ve seen evidence that students behave the same in Art Class and 
Physical education.”); 

■ Contrast (“That classroom management situation is quite different because of the 
setting ...”); 

■ Evaluation (“In my opinion that argument holds little credence.”); 

■ Idea to example (“Examples of irony in Shakespeare’s Macbeth are ...”); 

■ Example to idea (“The main theme of the novel ‘Barometer Rising’ is ...”); 

■ Clarification/elaboration (“I think what I am hearing is that you’re concerned about 
gender differences, but I am wondering whether another factor may be their 
age.”); 

■ Cause and effect (“The impact of that behaviour is of course reduced 
instructional time.”). 

Each category of discussion is associated with a simple representative graphic called a cognote. 
The graphic was created as a symbol for a toolbar in Microsoft Word®. The electronic 
discussion was copied from an HTML environment into Microsoft Word®, where a toolbar 
was prepared that included all of the cognotes. This constituted a template into which students’ 
captured discussions could be copied and then coded. A macro (prepared in Microsoft Word®) 
was then written that assigned a cognote to highlighted text within the document. The document 
was then saved with the cognotes in place and subsequently returned to the electronic discussion 
participant. 

Instructor Coding 

The first coding study (Aylward & MacKinnon, 1999) involved three successive two-week 
electronic discussions on gender issues in science education. The instructor coded 
students’ work and returned it between sessions. In the first electronic discussion, 
students accessed many of the lower-level discussion patterns (i.e., those worth 1 point rather 
than 2). However, over the three electronic discussions it was clear (see Figure 1) that students 
were participating in more substantive ways by accessing higher-order participation strategies, 
with a concomitant decrease in volume of writing. Transferability of these skills to other 
less-structured electronic discussions remains an interesting question that is currently being 
investigated (Pelletier, MacKinnon & Brown, 2002). 

Student Coding 

A second study was undertaken (MacKinnon & Bellefontaine, 2000) involving students in 
a middle school teacher education course. In this setting, students were asked to code each 
other’s work as they both participated in and coordinated an electronic discussion. Students 
participated in preliminary coding exercises that introduced the cognotes and the technology to 
them. Each student (n=30) had their electronic discussion coded and returned by three other 
students (the coordinators of their discussion) in their class. This process was repeated in three 
successive electronic discussions around a middle school case study. Individual raters had no 
indication of what grade the other raters had assigned to a particular student’s electronic 
discussion contribution. The coordinators of the electronic discussion had a vested interest in 
promoting substantive electronic discussion: they were required to capture electronic discussion 
quotes and embed them in a case study report. In follow-up focus group interviews, students 
identified this process as a constructive exercise. 

Generalizability to Other Settings 

The coding system has been assessed in several teacher education settings including 
science education, middle school education, physical education and inclusive education 
(MacKinnon, Pelletier & Brown, 2002). It remains to be seen whether cognotes can be used in 
other subject areas; however, a more important question has emerged from the second study 
(MacKinnon & Bellefontaine, 2000). Assuming students are prepared to assign codes (based on 
a tutorial session), can we be sure that students will access and assign the cognotes to their peers’ 
work in the same way? The cognotes are essentially an analytical rubric, and therefore inter-rater 
reliability is a concern. The data to answer this question is available from the middle school 
course research. 

Do Students Code the Same? 

Prior to coding their peers’ work, students are provided with representative text and 
“mock discussions”. In the exercise, the students have an opportunity to compare their assigned 
codes to those of their peers and the instructor. The group reaches a consensual understanding of 
the cognote implications (for those particular exercises) in an effort to promote reliability. 

In this study there were 30 students in a middle school education course. Each of these 
students was involved in three successive and independent electronic discussions. Each 
student’s discussion was captured and coded by three other students. Students (raters) assigned a 
grade (out of ten) to each of three participants in a designated electronic discussion. Figure 2 
shows a random sample (n=15) of total grades for a single discussion. In this typical sample, it 
is evident (see also Table 2) that the total grades assigned to each student by three 
independent raters are within one grade of each other in most instances. It is reassuring that the 
standard deviation for both this sample and the larger sample (n=30) is relatively small (sd=0.58). 
This suggests that raters are assigning very nearly the same grade to the same captured 
discussion; however, this does not mean they are necessarily assigning the same codes to arrive at 
that total grade. 
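
As a rough illustration of this kind of agreement check, the sketch below computes the spread and the sample standard deviation of the three grades given to one student. The grades used are the three totals listed for the first student in Table 2. The paper does not state how its standard deviation was calculated, so the use of the sample formula here, and the function name `rater_agreement`, are assumptions.

```python
from statistics import stdev


def rater_agreement(grades):
    """Return (range, sample standard deviation) of one student's grades."""
    return max(grades) - min(grades), stdev(grades)


# Total grades given to one student by three independent raters.
grades = [9, 10, 9]
spread, sd = rater_agreement(grades)
print(spread, round(sd, 2))  # 1 0.58
```

A spread of one or less corresponds to the "within one grade" criterion used in the text.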

Answering the question of whether raters assign the same codes requires a closer discourse 
analysis (Cazden, 1988; Edwards & Westgate, 1994; Lemke, 1997; Young, 1992) of the empirical 
coding captures. From the most global perspective, one can simply analyse which codes are 
being assigned in the total discussion and compare that across the three raters. That analysis 
follows. 

Patterns in Assigning Codes: Problem Areas 

Table 2 shows a sample of the cognotes assigned for fifteen students in a single electronic 
discussion, the third session of three consecutive electronic discussions. The data for all thirty 
students was tabulated in a similar manner and analysed for patterns. 

The following characteristics seemed most prominent in the data: 

1. Students at times seem to have trouble distinguishing the inductive from the deductive 
thinking pattern. 

2. A contribution which amounted to “posing a question” was never ambiguous. 

3. In the instance of compare or contrast discussion patterns there was rarely ambiguity. 

4. The evaluation icon represents an unsubstantiated opinion. This was sometimes confused 
with a cause and effect pattern. 

The problems of consensual assignment of cognotes have since been addressed in 
preliminary exercises which emphasise the aforementioned ambiguities. In the case of inductive 
versus deductive patterns, informal member checks established that this was simply a “which is 
which” problem with students, thus simply requiring a reiteration of the definitions with 
accompanying examples. In the case of the evaluation icon, some students had the 
misconception that this cognote implied the “weighing” of perspectives followed by an informed 
judgement. This understanding could then in turn be easily confused with cause and effect. In 
informal interviews it became clear that the misconception arose because of the choice of icon 
graphic. To this end the icon has since been changed to better reflect the implied 
“unsubstantiated opinion” category. 
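
One way to surface such ambiguities from tabulated codes is to list, for each student, the categories in which the three raters' counts differ. The sketch below is a minimal illustration under stated assumptions: the function name `disagreements`, the category labels, and the rater labels are hypothetical, and the counts are illustrative values modelled on the inductive versus deductive confusion described above.

```python
def disagreements(rater_counts):
    """Return the categories whose counts differ between raters.

    rater_counts maps a rater label to a {category: count} dict.
    A category missing from a rater's dict counts as zero.
    """
    categories = set()
    for counts in rater_counts.values():
        categories.update(counts)

    differing = {}
    for category in sorted(categories):
        per_rater = {r: c.get(category, 0) for r, c in rater_counts.items()}
        if len(set(per_rater.values())) > 1:
            differing[category] = per_rater
    return differing


# Illustrative counts for one student's discussion as coded by three raters.
student = {
    "J": {"acknowledgement": 1, "question": 1, "evaluation": 3,
          "idea_to_example": 1, "clarification": 1},
    "K": {"acknowledgement": 1, "question": 1, "evaluation": 3,
          "example_to_idea": 1, "clarification": 1},
    "L": {"acknowledgement": 1, "question": 1, "evaluation": 3,
          "idea_to_example": 1, "clarification": 1},
}

for category, counts in disagreements(student).items():
    print(category, counts)
```

In this example only the two induction and deduction categories are reported, even though all three raters would arrive at the same total grade.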

Given that the amount of coding assignment ambiguity is relatively small (primarily due 
to the comprehensive preliminary practice exercises), attention was then turned to the highlighted 
text itself. The question that emerges is whether the cognotes are being assigned to exactly the 
same portions of text in the discussion. 

Are Students Coding the Same Portions of Text? 

This is clearly a very difficult question to answer except by inspection and comparison. 
The following approach was taken in order to provide a cursory glimpse of the trends that may 
be emerging. In a single electronic discussion, each student’s work was coded by three raters. 
For a random sample of ten students, the coding of their contributions was analysed as follows. 

Consider a single student’s contributions. More specifically, let us look 
at rater one’s first highlighted entry and consider the start and end of the text (hereafter SET) for 
which they assigned a particular cognote. We then compare this to the second rater’s SET for 
the first highlighted entry. We look for overlap in the SETs selected and attempt to quantify 
it. To avoid a so-called mismatch, a decision was made to consider SET comparisons only 
when the cognote was identical for SETs that were considered overlapping (see Figure 3). One 
way of flagging the overlap is to count the number of sentences that are part of the 
overlap compared to those that are not. In Figure 3, then, one would count two sentences of 
overlap compared to two sentences excluded. 
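
The sentence-counting procedure can be sketched as follows. This is an interpretation rather than the exact calculation used in the study: each SET is represented as a set of sentence indices, "excluded" sentences are assumed to be those selected by only one of the two raters, and the function and label names are hypothetical.

```python
def set_overlap_percent(set_a, set_b):
    """Percentage of sentences shared by two raters' selections.

    Each argument is the set of sentence indices covered by one rater's SET.
    Shared sentences are divided by all sentences selected by either rater.
    """
    union = set_a | set_b
    if not union:
        return 0.0
    return 100.0 * len(set_a & set_b) / len(union)


def compare_sets(cognote_a, set_a, cognote_b, set_b):
    """Return the overlap only when both raters assigned the same cognote
    to overlapping text; otherwise return None (a mismatch, not counted)."""
    if cognote_a != cognote_b or not (set_a & set_b):
        return None
    return set_overlap_percent(set_a, set_b)


# Two sentences of overlap and two excluded, as in the Figure 3 example.
rater_one = {0, 1, 2, 3}
rater_two = {2, 3}
print(compare_sets("example_to_idea", rater_one,
                   "example_to_idea", rater_two))  # 50.0
```

Counting whole sentences is coarse: a selection that begins mid-sentence has to be rounded to the nearest sentence before the comparison is made.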

This rudimentary approach generated the average values shown in Table 3. All values 
are rounded to the nearest whole number. Admittedly, there are inherent problems with choosing 
to quantify the overlap in this way. Nonetheless, the data serves as a point of discussion, even if 
it is not rigorously generalisable. 

In this random sample of ten students from a single electronic discussion, it appears as 
though the raters are coding at least 50% of the same text blocks (SETs) with some consistency. 
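
The 50% observation can be checked directly against the averages reported in Table 3. The sketch below simply transcribes those thirty values and summarises them; the summary statistics are computed here for illustration only and are not reported in the study, and the variable names are assumptions.

```python
from statistics import mean

# Average % overlap per rater pair for students 1 to 10, as listed in Table 3.
TABLE_3 = {
    1: {"R1:R2": 65, "R1:R3": 60, "R2:R3": 68},
    2: {"R1:R2": 75, "R1:R3": 77, "R2:R3": 66},
    3: {"R1:R2": 50, "R1:R3": 75, "R2:R3": 69},
    4: {"R1:R2": 50, "R1:R3": 50, "R2:R3": 65},
    5: {"R1:R2": 75, "R1:R3": 77, "R2:R3": 75},
    6: {"R1:R2": 78, "R1:R3": 74, "R2:R3": 70},
    7: {"R1:R2": 80, "R1:R3": 82, "R2:R3": 76},
    8: {"R1:R2": 61, "R1:R3": 58, "R2:R3": 74},
    9: {"R1:R2": 66, "R1:R3": 72, "R2:R3": 77},
    10: {"R1:R2": 56, "R1:R3": 52, "R2:R3": 58},
}

values = [v for pairs in TABLE_3.values() for v in pairs.values()]
print(min(values), max(values), round(mean(values), 1))  # 50 82 67.7
print(all(v >= 50 for v in values))  # True
```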

What Can We Say Overall? 

The above study suggests the following: 

1) Student raters, when posed with the task of coding an electronic discussion using the 
cognote system, will typically generate grades within one mark of one another for the 
identical coded text. 

2) Raters, after three consecutive electronic discussions, tend to assign the same cognotes to 
students’ contributions, with the exception of a lingering confusion between (a) inductive 
and deductive thinking patterns and (b) the evaluation and cause and effect categories. 
These ambiguities were also addressed and corroborated in focus group sessions. 

3) A rudimentary analysis of the textual discourse suggests that raters tend to code at least 
50% of the same text for the equivalent cognote category. 

The study above attempts to establish that there is some measure of reliability in the use 
of the cognote system to evaluate electronic discussion. A more complete 
discourse analysis of the SET data could lend additional support for the reliability of the 
cognotes. The challenge remains to develop valid research techniques for analysis of this unique 
data format. 

Implications for Further Work 

The cognotes have been used in a variety of subject areas with a notable improvement in 
electronic discussion contributions. The reliability of the instrument can be addressed more 
analytically; however, early indications are that raters tend to use the same cognotes to assess a 
discussion item and in turn arrive at a similar grade for the participant. 

All of the studies to date have involved asynchronous discussion connected with a 
face-to-face course. The obvious extension of this tool is to investigate truly online environments. 
Because good discussion is paramount to online learning, it would seem that the cognotes could 
have great potential in this educational venue. 

In an effort to promote critical thinking, this investigator chose a particular group of 
higher-order discussion patterns. It is quite possible that additional cognotes are necessary. 
Qualitative interviews of raters should confirm whether this range of cognotes was sufficient to 
account for all categories of discussion patterns. Conversely, there may be sufficient overlap in 
some cognotes such that fewer cognote categories would suffice. In early studies it was evident 
that students needed time to become accustomed to the variety of cognotes and the way in which 
the cognotes manifested themselves in real discussions. Fewer cognotes may help students to 
“learn” the system more quickly. Again, further interviews with raters should establish whether 
this is a valid concern. 

Arguably the success of the cognote system is only truly recognised when the model is 
embedded in a constructive exercise. When the students recognise that good discussion will 
impact their work in some tangible way, the vested interest impacts the learning. Using the 
cognotes has been shown to promote critical thinking patterns, either through sheer practice or 
because a grade is assigned to the work. As this tool is used in more settings, it should become 
clear whether students retain the process skills which they acquire through use of the cognotes. 

Table 1 

Cognote Icons and Critical Thinking Cues 

Specific Interaction                                                     Grade   Coding Icon
Acknowledgement of Opinions (evidence of participation)                  1       [icon]
Question (thoughtful query)                                              1       [icon]
Compare (similarity, analogy)                                            2       [icon]
Contrast (distinction, discriminate)                                     2       [icon]
Evaluation (unsubstantiated opinion/judgement)                           1       [icon]
Idea to Example (deduction, analogy)                                     2       [icon]
Example to Idea (induction, conclusion)                                  2       [icon]
Clarification/Elaboration (reiterating a point, building on a point)     2       [icon]
Cause & Effect (inference, consequence)                                  2       [icon]
Off-Topic/Faulty Reasoning (entry inappropriate)                         0       [icon]
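
Table 2 

Coding Distribution Amongst Raters 

Columns follow the order of the cognotes in Table 1: Ack = Acknowledgement of Opinions; 
Ques = Question; Comp = Compare; Cont = Contrast; Eval = Evaluation; I-Ex = Idea to Example; 
Ex-I = Example to Idea; Clar = Clarification/Elaboration; C&E = Cause & Effect; 
Off = Off-Topic/Faulty Reasoning. 

Student/Coder  Ack   Ques  Comp  Cont  Eval  I-Ex  Ex-I  Clar  C&E   Off   Total Grade
1A             1     1     2     -     1     1     -     -     -     -     9
1B             1     1     2     -     2     1     -     -     -     -     10
1C             1     1     2     -     1     1     -     -     -     -     9
2D             2     -     1     -     2     -     1     1     -     -     10
2E             1     -     1     -     3     -     1     1     -     -     10
2F             1     -     1     -     3     -     1     1     -     -     10
3G             2     -     -     2     2     -     -     -     -     -     8
3H             2     -     -     2     1     -     -     1     -     -     9
3I             2     -     -     2     2     -     -     -     -     -     8
4J             1     1     -     -     3     1     -     1     -     -     9
4K             1     1     -     -     3     -     1     1     -     -     9
4L             1     1     -     -     3     1     -     1     -     -     9
5M             1     -     1     1     1     -     -     1     -     1     8
5N             1     -     1     1     1     -     -     -     1     1     8
5O             2     -     1     1     1     -     -     -     1     -     9
6P             1     1     2     -     -     1     -     1     -     -     10
6Q             1     1     2     -     -     1     -     1     -     -     10
6R             1     1     2     -     -     1     -     1     -     -     10
7S             1     1     -     -     4     -     -     -     1     -     8
7T             1     1     -     -     3     -     -     1     -     -     7
7U             1     1     -     -     3     -     -     -     1     -     7
8V             2     -     1     1     1     -     -     -     1     -     9
8W             2     -     1     1     1     -     -     -     1     -     9
8X             2     -     1     1     1     -     -     -     1     -     9
9Y             1     -     -     1     2     1     -     1     -     1     9
9Z             2     -     -     1     2     1     -     1     -     1     10
9AA            1     -     -     1     2     1     -     1     -     1     9
10BB           1     -     -     -     3     2     -     -     -     -     8
10CC           1     -     -     -     2     2     -     -     -     -     7
10DD           1     -     -     -     2     2     -     -     -     -     7
11EE           1     1     1     2     1     ?     ?     ?     ?     ?     ?
11FF           1     1     1     2     -     -     -     -     1     -     10
11GG           1     1     1     2     1     -     -     -     -     -     9
12HH           3     1     -     -     -     1     -     -     1     -     8
12II           2     1     -     -     -     -     1     -     1     -     7
12JJ           3     1     -     -     -     1     -     -     1     -     8
13KK           -     1     1     -     1     1     -     1     -     -     8
13LL           -     1     1     -     -     -     1     1     -     -     7
13MM           -     1     1     -     -     1     -     1     -     -     7
14NN           2     -     -     1     1     1     -     -     -     -     7
14OO           2     -     -     1     1     1     -     -     -     -     7
14PP           3     -     -     1     1     1     -     -     -     -     8
15QQ           1     2     -     -     -     2     -     1     -     -     9
15RR           -     2     -     -     -     2     -     1     -     -     8
15SS           -     2     -     -     -     2     -     1     -     -     8

Note. A dash indicates that the code was not assigned. For 11EE, the entries after the 
Evaluation column, including the total grade, are missing (shown as ?). 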

Table 3 

A Comparison of Raters and SET data 

Student  Rater Comparison  Average % Overlap    Student  Rater Comparison  Average % Overlap
1        R1:R2             65                   6        R1:R2             78
         R1:R3             60                            R1:R3             74
         R2:R3             68                            R2:R3             70
2        R1:R2             75                   7        R1:R2             80
         R1:R3             77                            R1:R3             82
         R2:R3             66                            R2:R3             76
3        R1:R2             50                   8        R1:R2             61
         R1:R3             75                            R1:R3             58
         R2:R3             69                            R2:R3             74
4        R1:R2             50                   9        R1:R2             66
         R1:R3             50                            R1:R3             72
         R2:R3             65                            R2:R3             77
5        R1:R2             75                   10       R1:R2             56
         R1:R3             77                            R1:R3             52
         R2:R3             75                            R2:R3             58

Figure Captions 

Figure 1. Trends in Electronic Discussion Contributions Over Three Sessions. 
Figure 2. Grading Totals for Individual Raters. 
Figure 3. Comparing SETs of Two Raters. 

Figure 1 

[Chart of Frequency by Rounds: 1st round, 2nd round, 3rd round] 

Figure 2 

[Chart of Total Coding Grade by Raters] 

Figure 3 

Rater One 

The principal might consider having a mini in-service in the 
school to recap what was learned in the summer. This could entail 
small group discussions on how a successful middle school team 
works. If all the teachers in the school were involved Janet might 
not feel victimised. She might begin to see the value of the new 
philosophy if all her fellow teachers share the same excitement for 
it. 

Rater Two 

The principal might consider having a mini in-service in the 
school to recap what was learned in the summer. This could entail 
small group discussions on how a successful middle school team 
works. If all the teachers in the school were involved Janet might 
not feel victimised. She might begin to see the value of the new 
philosophy if all her fellow teachers share the same excitement for 
it. [cognote icon] 